
    Evidence attribution in the UniProt Knowledgebase

    UniProtKB provides the scientific community with a comprehensive collection of protein sequence records containing extensive curated information, including functional and sequence annotation. This information is derived from a variety of sources, such as the scientific literature and sequence analysis programs, as well as from data imported from automatic annotation systems and external databases. To allow users to ascertain the origin of each data item in a UniProtKB record, an evidence attribution system is being introduced that links each piece of information to its original source. This system allows users to trace the origin of all information, to differentiate easily between experimental and computational data, and to assess data reliability. The current system and plans for its future development and enhancement will be presented.
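The evidence attribution idea above (each data item carries a link to its source, and experimental and computational data can be told apart) can be sketched as a small data structure. This is a minimal illustration only: the two-way `EvidenceKind` split, the class names and the placeholder reference "PMID:0000000" are hypothetical assumptions, not the actual UniProtKB evidence model.

```python
from dataclasses import dataclass, field
from enum import Enum


class EvidenceKind(Enum):
    # Hypothetical two-way split; a real system may use a much finer classification.
    EXPERIMENTAL = "experimental"
    COMPUTATIONAL = "computational"


@dataclass(frozen=True)
class Evidence:
    source: str  # where the data item came from (placeholder format)
    kind: EvidenceKind


@dataclass
class Annotation:
    text: str
    evidence: list[Evidence] = field(default_factory=list)

    def sources(self) -> list[str]:
        """Trace the annotation back to every source that supports it."""
        return [e.source for e in self.evidence]

    def is_experimental(self) -> bool:
        """True if at least one supporting source is experimental."""
        return any(e.kind is EvidenceKind.EXPERIMENTAL for e in self.evidence)


def split_by_kind(annotations: list[Annotation]) -> tuple[list[Annotation], list[Annotation]]:
    """Separate experimentally supported items from purely computational ones."""
    experimental = [a for a in annotations if a.is_experimental()]
    computational_only = [a for a in annotations if not a.is_experimental()]
    return experimental, computational_only


# Hypothetical record with placeholder annotations and sources.
record_items = [
    Annotation("Catalyzes reaction X", [Evidence("PMID:0000000", EvidenceKind.EXPERIMENTAL)]),
    Annotation("Contains domain Y", [Evidence("sequence analysis program", EvidenceKind.COMPUTATIONAL)]),
]
exp, comp = split_by_kind(record_items)
print([a.text for a in exp])   # ['Catalyzes reaction X']
print([a.text for a in comp])  # ['Contains domain Y']
```

Keeping the evidence list on each annotation, rather than on the record as a whole, is what makes per-item provenance and filtering possible.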

    UniProt in RDF: Tackling Data Integration and Distributed Annotation with the Semantic Web

    The UniProt Knowledgebase (UniProtKB) is a comprehensive repository of protein sequence and annotation data. We collect information from the scientific literature and other databases and provide links to over one hundred biological resources. Such links between different databases are an important basis for data integration, but the lack of a common standard to represent and link information makes data integration an expensive business. At UniProt we have started to tackle this problem by using the Resource Description Framework (RDF, http://www.w3.org/RDF/) to represent our data. RDF is a core technology of the World Wide Web Consortium's Semantic Web activity (http://www.w3.org/2001/sw/) and is therefore well suited to a distributed and decentralized environment. The RDF data model represents arbitrary information as a set of simple subject-predicate-object statements. To enable the linking of data on the Web, RDF requires that each resource have a globally unique identifier. These identifiers allow anyone to make statements about a given resource and, together with the simple structure of the RDF data model, make it easy to combine the statements made by different people (or databases) and to query across datasets. RDF is thus an industry standard that can make a major contribution to solving two important problems in bioinformatics: distributed annotation and data integration.
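The subject-predicate-object model and the role of shared global identifiers can be illustrated with plain Python sets of triples. This is a toy sketch, not UniProt's RDF schema: every IRI below uses the placeholder namespace http://example.org/, and a real deployment would more likely use an RDF library and SPARQL than hand-written pattern matching.

```python
# Minimal RDF-style triple store built from plain Python sets.
# All IRIs use the placeholder namespace http://example.org/ (not real UniProt URIs).
EX = "http://example.org/"
Triple = tuple[str, str, str]  # (subject, predicate, object)

P1 = EX + "protein/P1"
HUMAN = EX + "taxon/9606"
ORGANISM = EX + "vocab/organism"
SCI_NAME = EX + "vocab/scientificName"

# Statements published by two independent sources.
db_a: set[Triple] = {
    (P1, EX + "vocab/name", "Example kinase"),
    (P1, ORGANISM, HUMAN),
}
db_b: set[Triple] = {
    (HUMAN, SCI_NAME, "Homo sapiens"),
}

# Because both sources use the same global identifiers, merging is a set union.
merged: set[Triple] = db_a | db_b


def match(graph, s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in graph
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


def organism_names(graph, protein):
    """Follow protein -> organism -> scientific name, possibly across sources."""
    names = []
    for _, _, taxon in match(graph, s=protein, p=ORGANISM):
        for _, _, name in match(graph, s=taxon, p=SCI_NAME):
            names.append(name)
    return names


print(organism_names(db_a, P1))    # [] because db_a alone lacks the species name
print(organism_names(merged, P1))  # ['Homo sapiens']
```

The query only succeeds on the merged graph, which is the point made above: statements from different sources combine without any mapping step once they share identifiers.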

    Xenopus and Zebrafish Annotation in the UniProt Knowledgebase (UniProtKB)

    The African clawed frog Xenopus laevis and the zebrafish Danio rerio have both proved to be good model organisms for studying early vertebrate cellular and developmental biology. More recently, the related western clawed frog Xenopus tropicalis has become a popular choice in the laboratory, since its shorter generation time and diploid genome make it more amenable to genetic analysis. Ongoing sequencing of the X. tropicalis and D. rerio genomes, together with the growing number of EST/cDNA projects, is generating large amounts of sequence data and revealing many human developmental and disease genes that have counterparts in fish and frog.

UniProtKB/Swiss-Prot curates Xenopus and zebrafish proteins with functional and sequence annotation from the literature and from sequence analysis tools, using both controlled vocabularies (including GO terms) and free text. Genome duplication in X. laevis (allotetraploid) and D. rerio (an ancestral teleost whole-genome duplication) complicates annotation, since the duplicated protein copies need to be identified and curated as separate UniProtKB/Swiss-Prot entries. Xenbase cross-references were recently added to Xenopus UniProtKB entries as a result of collaboration with Xenbase, and we continue to work with ZFIN to ensure consistency between databases.

UniProt is mainly supported by the NIH, European Commission FELICS, Swiss Federal Government, PATRIC BRC and NSF grants.

    UniProt Knowledgebase: a hub of integrated data

    Data integration plays an increasingly important role in bringing together the large amounts of diverse information spread across disparate resources and presenting a comprehensive overview of these data to the scientific community. The UniProt Knowledgebase (UniProtKB) acts as a central hub of protein knowledge by providing a unified view of protein sequence and functional information. Manual and automatic annotation procedures are used to add data directly to the database, while extensive cross-referencing to more than 120 external databases provides access to additional relevant information in more specialised data collections. UniProtKB also integrates data such as protein sequences, protein-protein interactions, Gene Ontology terms and official gene nomenclature from a range of resources. All information in UniProtKB is attributed to its original source, allowing users to trace the provenance of all data. In addition, UniProtKB data is made freely available in a range of formats to facilitate integration with other databases, and the UniProt Consortium is committed to using and promoting common data exchange formats and technologies. This approach ensures that information is captured in the most appropriate resource for subsequent integration with other databases and also ensures maximum curation efficiency by preventing duplication of effort across multiple resources. How UniProt achieves this data capture and integration will be presented. The UniProt resource is available at http://www.uniprot.org.
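One of the exchange formats mentioned above is the UniProtKB flat-text format, in which cross-references appear on `DR` lines of the form `DR   DATABASE; identifier; optional fields.` The sketch below groups such lines by target database to show how a record acts as a hub of links. It is a simplified, hypothetical parser (it ignores isoform suffixes and other edge cases), and the sample lines use made-up placeholder identifiers rather than data from a real entry.

```python
from collections import defaultdict


def parse_dr_lines(lines):
    """Group UniProtKB flat-file DR (database cross-reference) lines by database.

    Simplified sketch: assumes '; ' separators and a single terminating period,
    and does not handle isoform suffixes such as '[P00000-1]'.
    """
    xrefs = defaultdict(list)
    for line in lines:
        if not line.startswith("DR   "):
            continue  # skip all non-cross-reference lines
        body = line[5:].strip()
        if body.endswith("."):
            body = body[:-1]  # drop only the terminating period; IDs may contain dots
        database, identifier, *extra = body.split("; ")
        xrefs[database].append((identifier, extra))
    return dict(xrefs)


# Illustrative lines with placeholder identifiers (not a real entry).
dr_sample = [
    "ID   EXAMPLE_HUMAN           Reviewed;         100 AA.",
    "DR   EMBL; X00000; CAA00000.1; -; mRNA.",
    "DR   PDB; 0XXX; X-ray; 2.00 A; A=1-100.",
    "DR   GO; GO:0005524; F:ATP binding; IEA:UniProtKB-KW.",
]
for db, refs in parse_dr_lines(dr_sample).items():
    print(db, refs)
```

For production use, the official UniProt file format documentation and an established parser are preferable to this hand-rolled version.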

    UniProtKB amid the turmoil of plant proteomics research

    The UniProt Knowledgebase (UniProtKB) provides a single, centralized, authoritative resource for protein sequences and functional information. The majority of its records are based on automatic translation of coding sequences (CDS) provided by submitters at the time of initial deposition to the nucleotide sequence databases (INSDC). This article gives a general overview of the current situation, with specific illustrations drawn from our annotation of the Arabidopsis and rice proteomes. Increasingly, only the raw sequence of a complete genome is deposited to the nucleotide sequence databases, while the gene model predictions and annotations are kept in separate, specialized model organism databases (MODs). To provide the complete proteomes of model organisms, UniProtKB therefore had to implement pipelines for importing protein sequences from Ensembl and Ensembl Genomes. A single genome can be the target of several unrelated sequencing projects, and the final assemblies and gene model predictions may diverge quite significantly. In addition, several cultivars of the same species are often sequenced (sequencing of 1001 Arabidopsis cultivars is currently under way), and the resulting proteomes are far from identical. One challenge for UniProtKB is therefore to store and organize these data conveniently and to clearly define the reference proteomes that are made available to users. Manual annotation is one of the hallmarks of the Swiss-Prot section of UniProtKB. Besides adding functional annotation, curators check, and often correct, gene model predictions; for plants, this task is limited to Arabidopsis thaliana and Oryza sativa subsp. japonica. Proteomics data providing experimental evidence that confirms the existence of proteins or identifies sequence features such as post-translational modifications are also imported into UniProtKB records, and the knowledgebase is cross-referenced to numerous proteomics resources.

    Manual Curation of Vertebrate Proteins in the UniProt Knowledgebase

    The UniProt Knowledgebase (UniProtKB) aims to provide the scientific community with a comprehensive, consistent and authoritative resource for protein sequence and functional information. Given the importance of human and vertebrate model data in biomedical research, a major focus is the high-quality manual curation of human proteins and their vertebrate orthologues. Manual curation involves (1) the extraction of experimental results from the scientific literature to enrich protein records with a wide range of information, including function, structure, interactions and subcellular location, (2) the manual verification of each sequence and clarification of discrepancies between sequence reports, and (3) the assessment of the output of a range of analysis programmes to ensure that sequence features are correctly reported. Manual curation also facilitates the standardization of experimental data, a step necessary for developing methods that enable the semi-automated transfer of manual annotation to uncharacterised or related proteins. Consequently, manual curation of vertebrate proteins plays a vital role in providing users with a complete overview of available data while ensuring its accuracy, reliability and accessibility. UniProtKB/Swiss-Prot currently contains the complete manually reviewed human proteome, comprising approximately 20,300 proteins, and an additional 61,000 reviewed entries from model vertebrates such as mouse, rat, apes, cow, chicken, zebrafish and Xenopus. Ongoing efforts continue to improve the quality of vertebrate sequences in collaboration with HAVANA, Ensembl, HGNC and RefSeq, to include new functional information as it becomes available, and to extend the coverage of curated proteins in vertebrate species. All data are freely available from http://www.uniprot.org.

    Experimental data from flesh quality assessment and shelf life monitoring of high pressure processed European sea bass (Dicentrarchus labrax) fillets

    Fresh fish are highly perishable food products; their short shelf life limits their commercial exploitation and leads to waste, which has a negative impact on aquaculture sustainability. New non-thermal food processing methods, such as high pressure (HP) processing, prolong shelf life while ensuring high food quality. The effect of HP processing (600 MPa, 25 °C, 5 min) on European sea bass (Dicentrarchus labrax) fillet quality and shelf life was investigated. The data presented comprise microbiome and proteome profiles of control and HP-processed sea bass fillets from 1 to 67 days of isothermal storage at 2 °C. Bacterial diversity was analysed by Illumina high-throughput sequencing of the 16S rRNA gene in pooled DNA from control or HP-processed fillets after 1, 11 or 67 days, and the raw reads were deposited in the NCBI-SRA database under accession number PRJNA517618. Yeast and fungal diversity was analysed by high-throughput sequencing of the internal transcribed spacer (ITS) region for control and HP-processed fillets at the end of storage (11 or 67 days, respectively); these reads have the SRA accession number PRJNA517779. Quantitative label-free proteomics profiles were analysed by SWATH-MS (Sequential Windowed data independent Acquisition of the Total High-resolution-Mass Spectra) in myofibrillar or sarcoplasmic enriched protein extracts pooled for control or HP-processed fillets after 1, 11 and 67 days of storage. Proteome data were deposited in the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD012737.
These data support the findings reported in the associated manuscript "High pressure processing of European sea bass (Dicentrarchus labrax) fillets and tools for flesh quality and shelf life monitoring", Tsironi et al., 2019, JFE 262:83-91, doi.org/10.1016/j.jfoodeng.2019.05.010.
FCT (Foundation of Science and Technology) COFASP/0002/2015; Portuguese Foundation for Science and Technology UID/Multi/04326/2019 POCI-01-0145-FEDER007440 UID/NEU/04539/2019

    GONUTS: the Gene Ontology Normal Usage Tracking System

    The Gene Ontology Normal Usage Tracking System (GONUTS) is a community-based browser and usage guide for Gene Ontology (GO) terms and a community system for general GO annotation of proteins. GONUTS uses wiki technology to allow registered users to share and edit notes on the use of each term in GO, and to contribute annotations for specific genes of interest. By providing a site for generating third-party documentation at the granularity of individual terms, GONUTS complements the official documentation of the Gene Ontology Consortium. To provide examples for community users, GONUTS displays the complete GO annotations from seven model organisms: Saccharomyces cerevisiae, Dictyostelium discoideum, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus and Arabidopsis thaliana. To support community annotation, GONUTS allows automated creation of gene pages for gene products in UniProt. GONUTS will improve the consistency of annotation efforts across genome projects, and should be useful in training new annotators and consumers in the production of GO annotations and the use of GO terms. GONUTS can be accessed at http://gowiki.tamu.edu. The source code for generating the content of GONUTS is available upon request.
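Per-gene annotation pages like those described above could, in principle, be assembled from GO annotation files in the tab-separated GAF 2.x format. The sketch below groups GAF rows by gene product and taxon. It is a hypothetical illustration, not how GONUTS is actually implemented; the `row` helper, the placeholder accessions P00001/P00002, the gene symbols and the "PMID:0000000" reference are all made up for the example.

```python
from collections import defaultdict


def gene_pages(gaf_lines):
    """Group GAF 2.x rows into one annotation list per gene product.

    Uses 0-based GAF column positions: 0 DB, 1 object ID, 2 symbol,
    4 GO ID, 6 evidence code, 12 taxon.
    """
    pages = defaultdict(list)
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # header and comment lines start with '!'
        cols = line.rstrip("\n").split("\t")
        key = (cols[0], cols[1], cols[2], cols[12])
        pages[key].append((cols[4], cols[6]))
    return dict(pages)


def row(db, obj_id, symbol, go_id, evidence, aspect, taxon):
    """Hypothetical helper: fill the 15 mandatory GAF columns with placeholders."""
    cols = [db, obj_id, symbol, "", go_id, "PMID:0000000", evidence, "",
            aspect, "", "", "protein", taxon, "20240101", "ExampleDB"]
    return "\t".join(cols)


gaf_sample = [
    "!gaf-version: 2.1",
    row("UniProtKB", "P00001", "geneA", "GO:0005524", "IDA", "F", "taxon:7955"),
    row("UniProtKB", "P00001", "geneA", "GO:0016301", "IEA", "F", "taxon:7955"),
    row("UniProtKB", "P00002", "geneB", "GO:0005634", "ISS", "C", "taxon:10090"),
]
for (db, obj_id, symbol, taxon), annotations in gene_pages(gaf_sample).items():
    print(db, obj_id, symbol, taxon, annotations)
```

Keying pages by database and object ID, with the taxon kept alongside, lets annotations from several organisms coexist in one collection without collisions between gene symbols.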